Noisy time series generation by feed-forward networks
We study the properties of a noisy time series generated by a continuous-valued feed-forward network in which the next input vector is determined from past output values. Numerical simulations of a perceptron-type network exhibit the expected broadening of the noise-free attractor, without changing the attractor dimension. We show that the broadening of the attractor due to the noise scales inversely with the size of the system, $N$, as $1/\sqrt{N}$. We show both analytically and numerically that the diffusion constant for the phase along the attractor scales inversely with $N$. Hence, phase coherence holds up to a time that scales linearly with the size of the system. We find that the mean first passage time, $t$, to switch between attractors depends on $N$ and on the reduced distance from bifurcation, $\tau$, as $t = a \frac{N}{\tau} \exp(b \tau N^{1/2})$, where $b$ is a constant which depends on the amplitude of the external noise. This result is obtained analytically for small $\tau$ and confirmed by numerical simulations.



I. INTRODUCTION 

The application of neural networks to time series covers several areas, such as prediction, identification and control. The problem of time series prediction was well studied in the past in the context of linear modeling, and was later extended to non-linear models. In this paper we analyze a typical class of architectures used in this field in the presence of additive noise, i.e. a feed-forward network governed by the following dynamic rule:

$$S_1^{t+1} = S_{out}^{t} + \eta^{t}, \qquad S_j^{t+1} = S_{j-1}^{t}, \quad j = 2, \dots, N, \tag{1}$$

where $S_{out}^{t}$ is the network's output at time step t, $S_j^{t}$ are the inputs at that time and $\eta^{t}$ is the additive noise (specified in section II); N is the size of the delayed input vector. The focus is set on the long-time (asymptotic) properties of the sequences generated by the system under the given dynamic rule. The clean model (without the additive noise) has been investigated and the main results are summarized below.

Since a realistic time series is noisy, it is imperative 
to understand the effect of noise on the output of the 
model. In this paper, we conduct an extensive quantita- 
tive study of the effect of noise on this particular class of 
model networks. We restrict the analysis to non-chaotic 
behaviour for two main reasons. First, chaotic behaviour 
does not allow long term prediction due to divergence 
of nearby trajectories, though such model networks are 
capable of generating chaotic sequences. Second, complex non-linear (but non-chaotic) time series form an important subclass which poses interesting questions. Hence understanding the relation between such complex behaviour and the architecture of the network is crucial from the point of view of time series prediction.

The basis for using time-delayed vectors as inputs is the theory of state space reconstruction of a dynamic system using delay coordinates [6,7]. An architecture incorporating time delays is the TDNN (time-delay neural network), which, when operated in the iterative mode, contains a recurrent loop (as in the model described above, without noise). This type of network is appropriate for learning temporal sequences, e.g. speech signals, and for short-term prediction. The model we investigate can be viewed as a degenerate form of a TDNN in which the delay lines are restricted to the input layer. Note that the dynamic rule (equation 1) corresponds to the closed-loop mode of operation used for generating subsequent predictions iteratively once the network has been trained on a given time series. Though some work has been done on the characterization of a dynamic system from its time series using neural networks, few analytical results that connect architecture and long-time prediction are available (see M. Mozer). Nevertheless, practical considerations for choosing the architecture have been investigated extensively.

Recently, it has been shown that a hierarchy exists among the complexities of time series generated by different architectures. This information can be used as a guideline for an application in the following way. Given a time series, one can extract quantitative measures of the complexity of the sequence, e.g. the attractor dimension, and choose an architecture for the prediction task which is high enough in the hierarchy to ensure that it is capable of generating such a complex sequence.

Let us briefly review the main findings for the clean model. For conciseness we shall refer to the model generating the sequence as a SGen (Sequence Generator). The simplest SGen consists of a perceptron (Figure 1) whose output at time t, $S_{out}^{t}$, is determined by the input vector at time t, $\vec{S}^{t} = (S_1^{t}, \dots, S_N^{t})$, as follows:

$$S_{out}^{t} = \tanh(\beta \vec{J} \cdot \vec{S}^{t}) \tag{2}$$

for a fixed weight vector $\vec{J}$ and gain $\beta$. The input vector $\vec{S}^{t}$ is given by:

$$S_j^{t} = S_{out}^{t-j}, \quad j = 1, \dots, N, \tag{3}$$

i.e. the inputs are chosen to be the output values at the previous N times. Thus, starting from an initial state $S_i$, $i = 1, \dots, N$, the system generates the sequence $S^{t}$, $t = 1, 2, \dots$, as follows:

$$S^{t} = \tanh\Big(\beta \sum_{j=1}^{N} J_j S^{t-j}\Big) \tag{4}$$
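The iteration in equation 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `make_weights` and `generate_sequence`, the single-cosine weight vector, the initial condition and the parameter values are all choices made here for the example.

```python
import math

def make_weights(N, K, phi, R=1.0):
    """Assumed weight vector with one cosine component, j = 1..N."""
    return [R * math.cos(2.0 * math.pi * K * j / N - math.pi * phi)
            for j in range(1, N + 1)]

def generate_sequence(J, beta, initial, steps):
    """Iterate S^t = tanh(beta * sum_j J_j S^{t-j}) without noise.

    `initial` holds the N most recent outputs, newest first.
    """
    window = list(initial)          # window[j-1] plays the role of S^{t-j}
    series = []
    for _ in range(steps):
        field = sum(w * s for w, s in zip(J, window))
        s_new = math.tanh(beta * field)
        series.append(s_new)
        window.insert(0, s_new)     # newest output becomes the first input
        window.pop()                # oldest input is discarded
    return series

# Illustrative parameters (assumed for this sketch).
N, K, phi, beta = 50, 17, 0.123, 1.0 / 17.0
J = make_weights(N, K, phi)
start = [0.1 * math.cos(2.0 * math.pi * K * j / N) for j in range(N)]
series = generate_sequence(J, beta, start, 2000)
```

The sliding window makes the delay-line structure explicit: only the newest entry is computed, all other inputs are shifted copies of earlier outputs.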
FIG. 1. SGen generating a time series.

In the case of a generic perceptron-SGen, the system is attracted into a quasi-periodic (QP) flow governed by one of the Fourier components of the power spectrum (PS) of the weight vector. Hence, the attractor dimension (AD) is one. Denoting the frequency and phase of the governing Fourier component by $K$ and $\phi$ respectively, the corresponding part in the weight vector is $J_j = R\cos(\frac{2\pi}{N}Kj - \pi\phi)$, and the dynamic solution, to leading order in N and for $1 \ll K \ll N$, is of the form:

$$S^{t} = \tanh\Big[A\cos\Big(\frac{2\pi}{N}(K - \phi)t\Big)\Big] \tag{5}$$

The amplitude ($A$) of this solution depends on the gain ($\beta$) and the phase ($\phi$) in the following way:



$N\beta \, \ldots$ (6)



where $B_p$ are the Bernoulli numbers. Note that $A$ vanishes below a critical value of the gain, $\beta_c = \frac{2}{NR}\frac{\pi\phi}{\sin(\pi\phi)}$, which depends both on the amplitude of the weight vector ($R$) and on its phase ($\phi$), indicating that the system undergoes a Hopf bifurcation at $\beta_c$.
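The critical gain can be checked numerically. Linearizing the tanh and inserting a pure oscillation at frequency $2\pi(K - \phi)/N$ into the feedback sum, the loop gain should equal one at $\beta_c = 2\pi\phi / (N R \sin(\pi\phi))$. The sketch below, with the hypothetical helper names `loop_gain` and `beta_c` and illustrative parameter values, evaluates the sum directly; it is a consistency check under these assumptions, not a derivation taken from the paper.

```python
import math

def loop_gain(N, K, phi, R=1.0, t=0):
    """Ratio of sum_j J_j cos(w (t - j)) to cos(w t), with w = 2*pi*(K - phi)/N."""
    w = 2.0 * math.pi * (K - phi) / N
    total = sum(R * math.cos(2.0 * math.pi * K * j / N - math.pi * phi)
                * math.cos(w * (t - j))
                for j in range(1, N + 1))
    return total / math.cos(w * t)

def beta_c(N, phi, R=1.0):
    """Critical gain of the linearized map (assumed closed form)."""
    return 2.0 * math.pi * phi / (N * R * math.sin(math.pi * phi))

# Illustrative values: N = 400, K = 60, phi = 0.2235.
product = beta_c(400, 0.2235) * loop_gain(400, 60, 0.2235)
```

For large N the product is close to one, up to corrections of relative order 1/N coming from the rapidly oscillating part of the sum.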

In the more involved case, the model consists of a multi-layer network (MLN). The solution is a combination of perceptron-like SGen solutions. The exact details, however, depend on $\beta$ and the specification of the weight vectors. The AD in the generic case is bounded by the number of hidden units connected to the input layer. Moreover, it was shown that the typical relaxation time for such a system from an arbitrary initial condition is proportional to the size of the delayed input vector. This result is of importance for time series prediction, since it sets a bound on the horizon of predictions.

The problem of noise in a dynamic system is of great 
importance for the behaviour of the system (e.g. stabil- 
ity), and hence its implications on the time series mea- 
sured from that system. In the classical theory of time 
series analysis (linear and non-linear), one is interested
in the prediction ability of a model when trained with 
noisy data. Since one intends to use a SGen to repro- 
duce noisy data, it is important to understand how noise 
affects the output of a generic SGen. In particular, it 
is crucial that the SGen be robust under the addition of 
noise, which is non-trivial given the non-linear feedback 
dynamics of the SGen. The addition of noise enables us 
to check the stability of the previous results, obtained for 
isolated models. 

As we shall see, the SGen is indeed extremely stable in the presence of noise. The noise causes the attractor to broaden, but even large noise, of the order of the signal, does not destroy the attractor. This gives rise to several quantitative issues. In section III we focus on a perceptron-SGen with one Fourier component in its weight vector. First, we analyze how the broadening of the attractor due to the noise scales with N. This quantity manifests the cooperative aspect of the degrees of freedom in the system. We show that the broadening decreases with N as $1/\sqrt{N}$. Next, we discuss the issue of phase coherence (PC). Loss of PC is a generic phenomenon for periodic systems perturbed by noise. There, we analyze the extent to which adding noise to the SGen reduces its PC. The analysis is done for two types of dynamic rules, namely sequential updating (described above) and parallel updating (see section II). We show that the phase behaves as a biased random walk, as typically observed in noisy oscillators; however, the diffusion coefficient D exhibits a power-law dependence on N. For the sequential (parallel) rule, $D \sim 1/N^2$ ($1/N$). The importance of this result is that for large systems, PC is lost only over times that scale with the size of the system. This loss of PC also leads to a broadening of the dominant component in the power spectrum. We observe that this broadening decreases with N, consistent with the decrease of D with N discussed above.

Next, we measure the AD of the broadened attractor. As mentioned before, we focus on the classification of various SGens by the long-term sequences they produce, and are therefore interested in estimating this quantity. In section IV we apply standard methods to estimate the AD of time series generated by the SGen. With no noise added, we of course recover the analytical results, e.g. AD = 1 for the perceptron-SGen. The more important question is how the noise added to the system influences the measured AD. Our treatment parallels that found in the literature on dynamic systems, where the AD was estimated from a measured (noisy) time series taken from chaotic systems or strange attractors [10,11]. We measure the AD of a perceptron-SGen, as well as of a Committee Machine whose parameters were chosen in such a way that two Fourier components have a non-zero coefficient and whose AD, therefore, should equal 2. We find that for length scales greater than the typical size of the noise and well below the attractor's radius, the AD of the SGen does not differ from the expected analytical results.

Finally, in section V we analyze the effect of noise on a SGen with multiple attractors. While in the non-noisy case the perceptron-SGen exhibits a single stable attractor, here we expect transitions between attractors due to the noise. We focus on the average time needed to escape from a basin of attraction, and particularly on its dependence on the sizes of both the system and the attractor. This quantity, also known as the mean first passage time, has been investigated extensively in the context of chemical reactions, dynamical systems, etc. [12-14]. Here we are interested in the case of a discrete system, which has received less attention. We consider the case of a system governed by two Fourier components that results in two attractors. The problem of escape time is related to the evolution of the amplitude in coupled map equations. The phase portrait of such a map suggests that the motion in this phase space can be approximated by a one-dimensional flow of the form:

$$x_{n+1} = f(x_n) + \xi_n \tag{7}$$

where $f(x)$ is a non-linear map and $\xi$ is the noise term. Following the treatment of Talkner et al. [13], we relate our system to the problem of discrete dynamics with small non-linearity in the presence of weak noise. The analytical result is in good agreement with extensive simulations of the perceptron-SGen for both the polynomial prefactor and the leading exponential part.
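To make the notion of a mean first passage time in a noisy one-dimensional map concrete, the following toy simulation may help. Everything in it is an assumption made for illustration: the cubic map `toy_map` (with stable fixed points at -1 and +1) is a hypothetical stand-in and not the map derived for the SGen, the noise is uniform, and the escape criterion is simply the first crossing of x = 0.

```python
import random

def toy_map(x, eps=0.2):
    """Hypothetical weakly non-linear map with stable fixed points at x = -1 and x = +1."""
    return x + eps * x * (1.0 - x * x)

def first_passage_time(noise_amp, rng, x0=-1.0, max_steps=100000):
    """Number of steps until x_{n+1} = f(x_n) + xi_n first exceeds 0, starting at x0."""
    x = x0
    for n in range(1, max_steps + 1):
        x = toy_map(x) + rng.uniform(-noise_amp, noise_amp)
        if x > 0.0:
            return n
    return max_steps  # no escape observed within the cap

def mean_first_passage_time(noise_amp, trials=200, seed=0):
    """Average first passage time over independent trials."""
    rng = random.Random(seed)
    return sum(first_passage_time(noise_amp, rng) for _ in range(trials)) / trials

mfpt_weak = mean_first_passage_time(0.6)    # weaker noise: escapes are rarer
mfpt_strong = mean_first_passage_time(0.9)  # stronger noise: escapes are faster
```

In this toy setting the escape time grows rapidly as the noise amplitude is reduced, which is the qualitative behaviour behind the exponential factor discussed in the text.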

The results presented herein will primarily focus on 
the perceptron-SGen. Nevertheless we expect that the 
general properties and trends remain true in the more 
general case. 

A summary and a discussion are presented in section VI.

II. PRELIMINARIES 

Let us introduce a few concepts which are of general 
use in the following. The basic model is the SGen in its 
simplest form - a perceptron whose output is connected 
to the first input, as described in the previous section. 
This is the sequential updating rule, given by eqs. 2-4.

The sequential scheme can be thought of as a fully connected network with N + 1 units. The units are updated one at a time, i.e. at each time step, another unit plays the role of the output unit. The weight matrix connecting the units is asymmetric, with a spatial structure in which the interactions are a function only of the difference between the locations of each pair of units (i, j):

$$J_{ij} = W_{(j-i) \bmod (N+1)} \tag{8}$$

where $W_0 = 0$, and $W_l$ ($l \neq 0$) is the same weight vector as in the sequential rule. The main diagonal elements are zero, and the remaining elements of each row are the values of the first row, cyclically permuted, e.g. for N = 3:

$$J = \begin{pmatrix} 0 & W_1 & W_2 & W_3 \\ W_3 & 0 & W_1 & W_2 \\ W_2 & W_3 & 0 & W_1 \\ W_1 & W_2 & W_3 & 0 \end{pmatrix} \tag{9}$$

This type of weight matrix is said to have a Toeplitz structure. To implement the parallel scheme, all the units are updated simultaneously, using the same update rule, via the matrix described in equation 8:

$$S_i^{t+1} = \tanh\Big(\beta \sum_{j=1}^{N+1} J_{ij} S_j^{t}\Big) \tag{10}$$
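The construction of the coupling matrix and one parallel update can be sketched as follows. The index convention is an assumption: row i is taken to be the first row cyclically shifted i places to the right, with zeros on the diagonal, and the names `coupling_matrix` and `parallel_step` are hypothetical.

```python
import math

def coupling_matrix(W):
    """Build J[i][j] = W[(j - i) mod (N + 1)] from W = [0, W_1, ..., W_N]."""
    n = len(W)
    return [[W[(j - i) % n] for j in range(n)] for i in range(n)]

def parallel_step(Jm, S, beta):
    """Update all N + 1 units simultaneously (noise-free parallel rule)."""
    return [math.tanh(beta * sum(Jij * s for Jij, s in zip(row, S)))
            for row in Jm]

# Example for N = 3, using the numbers 1, 2, 3 in place of W_1, W_2, W_3.
Jm = coupling_matrix([0.0, 1.0, 2.0, 3.0])
S_next = parallel_step(Jm, [0.1, -0.2, 0.3, 0.0], beta=0.5)
```

Because every row is a shifted copy of the first, only the N weights need to be stored; the full matrix is built here purely for readability.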



In the sequential scheme, noise is introduced into the system in the following way:

$$S_1^{t+1} = S_{out}^{t} + \eta^{t} \tag{11}$$

where $\eta^{t}$ is distributed according to:

$$E[\eta^{t}] = 0, \qquad E[\eta^{t}\eta^{t'}] = \sigma^2 \delta_{t,t'}. \tag{12}$$

In this way, noise is added only to the first unit in each iteration of the dynamic rule. In the parallel updating scheme, the noise is represented by a vector with N + 1 independent components $\eta_i^{t}$, which is added to all units simultaneously in each iteration:

$$S_i^{t+1} = \tanh\Big(\beta \sum_{j=1}^{N+1} J_{ij} S_j^{t}\Big) + \eta_i^{t} \tag{13}$$
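A minimal sketch of the noisy sequential rule is given below. It assumes uniformly distributed noise of amplitude `noise_amp` (for which the variance is `noise_amp**2 / 3`); the function name, the weight vector and all parameter values are illustrative assumptions rather than the settings used in the simulations.

```python
import math
import random

def noisy_sequential_sgen(J, beta, initial, steps, noise_amp, rng):
    """Sequential rule with additive noise: the stored value is S_out + eta."""
    window = list(initial)                  # newest value first
    series = []
    for _ in range(steps):
        s_out = math.tanh(beta * sum(w * s for w, s in zip(J, window)))
        eta = rng.uniform(-noise_amp, noise_amp)
        s_new = s_out + eta                 # noise enters only the first input
        series.append(s_new)
        window = [s_new] + window[:-1]      # shift the delay line by one
    return series

# Illustrative set-up: one cosine component, N = 32, K = 7.
N, K = 32, 7
J = [math.cos(2.0 * math.pi * K * j / N) for j in range(1, N + 1)]
start = [0.1] * N
noisy = noisy_sequential_sgen(J, 2.5 / N, start, 500, 0.1, random.Random(1))
clean = noisy_sequential_sgen(J, 2.5 / N, start, 500, 0.0, random.Random(1))
```

Setting `noise_amp` to zero recovers the clean rule, which gives a convenient reference series for comparisons.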



As we previously noted, the sequential SGen produces a time series which can be denoted by $S^{t}$, $t = 1, 2, \dots$. The sequence $S^{t}$ is the basis of the numerical analysis. In order to use the rich theory of state space reconstruction, one has to embed the time series in a phase space. Embedding a time series in a d-dimensional space generates a set of vectors (or a trajectory) in that space. The embedded vectors are:

$$\vec{X}_t = (S^{t}, S^{t-1}, \dots, S^{t-d+1}) \tag{14}$$
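The embedding is straightforward to express in code. The sketch below assumes a unit delay between successive coordinates, as in the definition above; the function name `delay_embed` is hypothetical.

```python
def delay_embed(series, d):
    """Return the vectors X_t = (S^t, S^{t-1}, ..., S^{t-d+1}) for all valid t."""
    return [tuple(series[t - k] for k in range(d))
            for t in range(d - 1, len(series))]

vectors = delay_embed([1, 2, 3, 4], 2)
```

A series of length L yields L - d + 1 embedded vectors, since the first d - 1 samples do not have enough history.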
III. PROPERTIES OF A SINGLE ATTRACTOR

In this section, we analyze the properties of a perceptron-like SGen with a weight vector that contains a single Fourier component with an arbitrary phase ($\phi$), of the form $J_j = R\cos(\frac{2\pi}{N}Kj - \pi\phi)$. When no noise is added to the dynamic equation (equation 4), the generic stable solution was found to be a quasi-periodic orbit, e.g. Figure 2.

FIG. 2. Quasi-periodic orbit generated by a perceptron with N = 50, K = 17, $\beta = 1/17$, $\phi = 0.123$.

When noise is added (equation 11), the orbit is broadened. Nevertheless, the system does not become ergodic and the trajectory remains confined in phase space. A characteristic quantity is the noise-induced width of the broadened attractor. In the following we present both a quantitative explanation and measurements of the dependence of this quantity on the size of the system N. Next, we discuss the important issue of phase coherence. A periodic system in the presence of noise typically exhibits a loss of phase coherence. This is a result of the fundamental invariance of the system with respect to time translation: there is no restoring force against a perturbation which induces a phase shift. As we shall see, this results in a broadening of the PS of the time series generated by the system.



A. Attractor broadening 

Let us define the width of the attractor, $\langle W \rangle$, to be the average local broadening of the embedded time series, see Figure 3.

FIG. 3. Same parameters as previous figure but with a uniform noise of amplitude ±0.1 added.

In this case, we embed the data in a two-dimensional space and measure the extent perpendicular to the local tangent. Having done this for systems of sizes N = 20, 50, 100, 200, we plot $\langle W \rangle$ (denoted by < width > in the figure) vs. N in Figure 4. There exists a clear power-law scaling between the two quantities, of the form $\langle W \rangle \propto A/\sqrt{N}$, where A is a constant.

FIG. 4. The average width of the embedded time series. The power-law fit (dashed line) is $0.15/N^{\alpha}$, $\alpha = 0.5 \pm 0.007$.

To understand this scaling law, consider a random vector (RV) in an N-dimensional space. The relevant quantity is the projection of such a RV on a fixed vector, the weight vector $\vec{J}$. Denote the output field h as a sum of projections resulting from the stable solution vector $\vec{S}$ and the noise vector $\vec{\eta}$:

$$h = \vec{J} \cdot (\vec{S} + \vec{\eta}) = x_s + x_n. \tag{15}$$

The components of $\vec{\eta}$ are the last N noise terms given by equation 11. The output value is then $S_{out} = \tanh[\beta h]$. In writing equation 15 we neglect contributions from noise terms after iterations of the map, as these corrections are proportional to $\beta$ and so are $O(1/N)$. This can be justified as long as the parameter $\beta$ can be written as

$$\beta = (1 + b)\beta_c \tag{16}$$

and b does not scale with N. The term $x_s$ is of $O(N)$, as this is the exact solution without noise. The term $x_n$ is the focus of our interest. Since the components $\eta_j$ are RVs, we can calculate the first two moments of $x_n$:

$$E(x_n) = \sum_{i=1}^{N} J_i E(\eta_i) = 0, \qquad E(x_n^2) = \sum_{i,j=1}^{N} J_i J_j E(\eta_i \eta_j) = \sigma^2 \sum_{i=1}^{N} J_i^2. \tag{17}$$

Thus the variance of the noise term is of $O(N)$. The geometrical interpretation is that a RV has a projection of $O(\sqrt{N})$ on a given direction. Since the parameter $\beta$ scales as $1/N$ (as long as b in equation 16 does not scale with N), we can conclude that the contribution of the noise term scales as $1/\sqrt{N}$, in agreement with the numerical results presented above. Note that these results hold even for large noise values and are linear in $\sigma$.
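The scaling argument can be verified with a short calculation. The sketch below evaluates the standard deviation of the noise projection, $\sigma\sqrt{\sum_i J_i^2}$, for a single-cosine weight vector and multiplies it by a gain of the form $\beta = c/N$. The constant `c`, the choice $\phi = 0$, $R = 1$ and the function names are assumptions made for the example.

```python
import math

def weights(N, K, phi=0.0, R=1.0):
    """Single-cosine weight vector, j = 1..N."""
    return [R * math.cos(2.0 * math.pi * K * j / N - math.pi * phi)
            for j in range(1, N + 1)]

def noise_field_std(J, sigma):
    """Standard deviation of x_n = sum_i J_i eta_i for i.i.d. noise of variance sigma^2."""
    return sigma * math.sqrt(sum(w * w for w in J))

def output_noise_scale(N, K, sigma, c=2.5):
    """beta * std(x_n) with beta = c / N, where c is an assumed O(1) constant."""
    beta = c / N
    return beta * noise_field_std(weights(N, K), sigma)

# Quadrupling N should halve the noise contribution.
ratio = output_noise_scale(100, 7, 0.1) / output_noise_scale(400, 7, 0.1)
```

Since $\sum_i J_i^2 = N R^2 / 2$ for a cosine with an integer wave number, the product is proportional to $\sigma/\sqrt{N}$, linear in the noise amplitude as stated in the text.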



B. Phase coherency 

On general grounds, we expect the phase to undergo a biased random walk, where the bias represents the frequency of the unperturbed system. We can measure this directly by comparing the phase of the noisy system to that of a noise-free "reference" system. Starting from identical initial conditions, the accumulated phase in each series is measured. We denote the accumulated phase in the clean/noisy series (subscripts c, n) at time t by:

$$\Phi_c(t) = \sum_{i=0}^{t} \big(\phi_c(i+1) - \phi_c(i)\big), \qquad \phi_c(0) = \phi_n(0) \tag{18}$$

$$\Phi_n(t) = \sum_{i=0}^{t} \big(\phi_n(i+1) - \phi_n(i)\big)$$

where the phases $\phi_c(i)$, $\phi_n(i)$ are the relative phases of the i-th clean/noisy embedded vectors with respect to an arbitrary, but fixed, coordinate system (see Figure 5, ignoring the left part of the figure).

FIG. 5. Relative phases of the embedded vectors. The left part of the figure describes a "clean" point surrounded by typical noisy points.

The quantity of interest is the expectation value of the squared phase difference, defined by:

$$\langle \Delta\Phi^2(t) \rangle = E\big[(\Phi_c(t) - \Phi_n(t))^2\big] \tag{19}$$

where $\langle \cdot \rangle$ stands for the average over all samples taken after the same time t.

An example of the quantity defined in equation 19 is given in Figure 6. Clearly, this behaviour indicates that the process is diffusive.

FIG. 6. Example of the behaviour of the variance of the phase difference over time. The slope (dashed line) is the linear regression.
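The measurement described above can be sketched as three small helpers: one that accumulates the phase of two-dimensional embedded vectors, one that averages the squared phase difference over samples, and one that fits a slope. The wrapping of each phase increment to the interval [-π, π) is an assumption of this sketch (it requires the phase advance per step to be smaller than π), and all function names are hypothetical.

```python
import math

def accumulated_phase(points):
    """Accumulated phase of a sequence of 2D points, summing wrapped angle increments."""
    phases = [0.0]
    prev = math.atan2(points[0][1], points[0][0])
    for x, y in points[1:]:
        cur = math.atan2(y, x)
        # Wrap the increment to [-pi, pi) so jumps across the branch cut are removed.
        step = (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        phases.append(phases[-1] + step)
        prev = cur
    return phases

def mean_square_phase_difference(clean, noisy_runs):
    """Average over runs of (Phi_c(t) - Phi_n(t))^2 for each time t."""
    return [sum((clean[t] - run[t]) ** 2 for run in noisy_runs) / len(noisy_runs)
            for t in range(len(clean))]

def least_squares_slope(ts, ys):
    """Ordinary least-squares slope of ys against ts."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    num = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den
```

Applying `least_squares_slope` to the output of `mean_square_phase_difference` gives an estimate of the diffusion coefficient in the sense used here, i.e. the slope of the variance against time.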

The slope of this figure represents the diffusion coefficient. The diffusion coefficient was extracted from data of the type represented in Figure 6 for both parallel and sequential updating rules. Each data point is an average over 400 samples (as in the figure). In each case, the simulations were run at different system sizes. The exact parameters of each SGen are not important; however, they were chosen such that the solution is QP and well above the critical value $\beta_c$ where a bifurcation occurs. Each point in Figures 7 and 8 is the slope of the linear regression, and the statistical error is less than the size of the point. The results from the figures reveal a scaling law for the diffusion coefficient D:

$$D \sim 1/N^{\alpha} \tag{20}$$

where $\alpha = 1$ (2) for the parallel (sequential) rule.

FIG. 7. Diffusion coefficient for the parallel rule. The linear regression (dashed line) is $D = 0.029/N^{\alpha}$ with $\alpha = 1 \pm 0.03$.

FIG. 8. Diffusion coefficient for the sequential rule. The linear regression (dashed line) is $D = 0.154/N^{\alpha}$ with $\alpha = 2 \pm 0.036$.
To understand these results, let us now extend the arguments that led to the "width" of the noise in the previous section. We start with the parallel dynamics and develop a relation between $\langle \Delta\Phi^2(t) \rangle$ and time. It was shown that the contribution of the noise is of order $1/\sqrt{N}$. Examine Figure 5 (its left part) in the context of Figures 2 and 3. Each point along the clean orbit is surrounded by a cloud whose typical radius is of $O(1/\sqrt{N})$. So, basically, the distance between one iteration of the same point in the clean and the noisy series is of $O(1/\sqrt{N})$. Since the noise is assumed to be small, the phase shift can be approximated by this distance projected on the QP orbit. Hence, the variance of the phase shift per iteration scales as $1/N$. This explains the scaling law in the parallel case. The sequential dynamics has the same characteristics; however, the time steps should be rescaled with respect to the parallel dynamics by a factor of $1/N$. That is the reason for the $1/N^2$ scaling.

One can conclude that phase diffusion indeed occurs (as expected); however, its associated time scale increases with the size of the system in a power-law fashion (equation 20). Therefore the system remains coherent over increasingly long times as N increases.

The loss of PC is also manifested in the Fourier domain, in the broadening of the dominant Fourier component. In the unperturbed system, the power spectrum of the stable solution is characterized by a sharp peak (delta function). The noisy system produces a sequence whose power spectrum is broadened around the unperturbed Fourier component. The larger the phase diffusion constant D, the more broadened the dominant component. We indeed observe that the broadening decreases with N. Figure 9 depicts the power spectra of two sequences (of the same length) generated by two perceptron-SGens of sizes N = 32, 128. The wave number of the single Fourier component is K = 7 and the weight vector is produced according to $J_i = \cos(\frac{2\pi}{N}Ki)$, $i = 1, \dots, N$. The power axis is drawn on a log scale to emphasize the broadening effect.

FIG. 9. Broadening of the dominant component in the power spectrum (horizontal axis: frequency). The weight vector consists of one Fourier component with K = 7. The system sizes are N = 32, 128.



IV. ATTRACTOR DIMENSION 

We have seen that in the case of the perceptron-SGen, the noise gives rise to a broadening of the attractor. The attractor nonetheless remains essentially 1-dimensional, as a perusal of Figures 2 and 3 immediately verifies. This is consistent with the general behaviour of simple attractors in the presence of noise. For the case of the MLN, we expect the general picture to persist. It is, however, non-trivial to verify this, since the attractor is higher-dimensional. For this purpose we employ the tools that have been developed for analysing dynamical systems from their time series. Of course, the question of attractor dimension is crucial for exploiting these networks for prediction and modeling.






Many methods have been proposed for estimating the AD. We just mention the simplest method, "Box-Counting". In fact, most methods are based on statistical estimators for the dimensionality of the attractor. We used the Correlation-Integral method, introduced by Grassberger and Procaccia. In this method the AD, denoted by D, is estimated by calculating the correlation sum C(r) from the data as follows:

$$D = \lim_{r \to 0} \frac{\ln C(r)}{\ln r} \tag{21}$$

where:

$$C(r) = \frac{1}{N_p^2} \sum_{i,j=1}^{N_p} \Theta\big(r - |\vec{X}_i - \vec{X}_j|\big), \tag{22}$$

$\vec{X}_i$ are the embedded time series vectors (equation 14), $N_p$ is the number of data points and $\Theta$ is the Heaviside step function.
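A direct, if slow, way to evaluate the correlation sum and its local slope is sketched below. It is a common variant rather than a transcription of the definition above: it counts each pair once, normalizes by the number of pairs considered, and has an optional minimum time separation `min_sep` between the two points of a pair. The function names and the test data are assumptions for illustration.

```python
import math

def correlation_sum(vectors, r, min_sep=1):
    """Fraction of pairs (i, j), j - i >= min_sep, closer than r in Euclidean distance."""
    n = len(vectors)
    count = 0
    pairs = 0
    for i in range(n):
        for j in range(i + min_sep, n):
            pairs += 1
            if math.dist(vectors[i], vectors[j]) < r:
                count += 1
    return count / pairs if pairs else 0.0

def local_slope(vectors, r1, r2, min_sep=1):
    """Finite-difference estimate of d ln C(r) / d ln r between r1 and r2."""
    c1 = correlation_sum(vectors, r1, min_sep)
    c2 = correlation_sum(vectors, r2, min_sep)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))

# Points on a straight segment embedded in the plane: the slope should be close to 1.
segment = [(0.0025 * i, 0.0) for i in range(400)]
slope = local_slope(segment, 0.051, 0.101)
```

The double loop costs O(N_p²) distance evaluations, which is acceptable for a few thousand points; larger data sets would call for a neighbour-search structure.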

In practice, the AD is estimated in the so-called scaling region of the correlation integral, i.e. one has to identify a sufficiently large range of length scales over which the slope is constant. In many cases the picture is not very clear, especially when the number of points is not large enough, or when certain parameters in the algorithm for estimating C(r) (e.g. the delay time) are not optimized. We also note that since the data has a high degree of correlation, one has to introduce a cut-off to exclude pairs of points that were generated closer (in time) than this value. Such points have strong correlations that affect the correlation dimension, which is meant to measure the correlation between points from different passes of the trajectory. We used the first zero of the autocorrelation function as a cut-off.
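The cut-off can be obtained with a short routine such as the one below, which returns the first lag at which the (mean-subtracted, unnormalized) autocorrelation becomes non-positive. The function name and the cosine test signal are assumptions made for this sketch.

```python
import math

def first_zero_of_autocorrelation(series):
    """Smallest lag >= 1 at which the autocorrelation is zero or negative, or None."""
    n = len(series)
    mean = sum(series) / n
    x = [s - mean for s in series]
    for lag in range(1, n):
        c = sum(x[t] * x[t + lag] for t in range(n - lag))
        if c <= 0.0:
            return lag
    return None

# A cosine of period 40 decorrelates after roughly a quarter period.
example = [math.cos(2.0 * math.pi * t / 40.0) for t in range(400)]
lag = first_zero_of_autocorrelation(example)
```

The returned lag can then be used as the minimum time separation between the two points of any pair entering the correlation sum.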

When the measurements are corrupted with noise, one can distinguish between two regimes of length scales: one dominated by the attractor and the other by the noise. This problem was originally investigated by Mizrachi et al. and Zardecki. In the broad sense, one can identify four regions. Due to the finite number of points in the data sample, for very small r the number of points in a sphere of radius r approaches zero, and hence so does the slope. At larger r, there is a transition to a region where the noise dominates. If the number of points is large enough, the slope saturates at the embedding dimension. At yet larger r, one enters the scaling region with a constant slope estimating the AD (given that the region is large enough). Finally, the slope returns to zero as r reaches the attractor's radius. For clarity, only the second and third regions are shown.

Let us now describe our measurements. The time series were generated for four cases: a perceptron without noise (Figure 10); a perceptron with noise added as described in equation 11 (Figure 12); and a Committee Machine (CM) with three hidden units, without and with noise (Figures 11, 13). By a CM we mean a two-layered network whose second-layer weights equal one. Each perceptron in the hidden layer (as well as the perceptron-SGen) has only one Fourier component in its weight vector and an arbitrary phase:

$$J_j^{h} = R_h \cos\Big(\frac{2\pi}{N} K_h j - \pi\phi_h\Big) \tag{23}$$

where $R$ is an amplitude, N is the input size, $K$ is the wave number, $\phi$ is a constant phase shift and h labels the hidden unit. (The case of more than one Fourier component is treated in a different context in section V.) The gain parameter $\beta$ in the CM was chosen so that the stable attractor of this SGen contains only two components in the power spectrum. This choice produces a 2D attractor. (The values of all the parameters are given in the figure captions.)

The figures present the calculated $\ln C_2(r)$. The AD is estimated by the local slope, $d[\ln C_2(r)]/d[\ln r]$, presented in the inset. It is important to note that all data points are rescaled to the region [0, 1] prior to the evaluation of the correlation integral. Figure 10 presents results for the simplest perceptron-SGen with only one Fourier component in its weight vector. The arbitrarily chosen phase shift $\phi$ results in a QP orbit which is 1D (AD = 1). We embedded the time series in m = 2, 3 and 4 dimensional spaces. Clearly the measurements support the analytical results, and the measured AD is about 1.01. In Figure 11, we present the results for the more complicated attractor generated by the CM. The expected AD is 2 (as described above). The results are slightly above 2, namely 2.0 < AD < 2.03. Notice that the embedding in a 2D space gives a wrong result, as expected, since the structure of the attractor is unfolded only in a space of at least 3 dimensions.



FIG. 10. Perceptron without noise. N = 400, $\beta = 1/180$, $\phi = 0.2235$, R = 1.0. The solid guide line in the inset is at 1.01. (o m = 2, □ m = 3, O m = 4).

FIG. 11. CM that exhibits a 2D attractor, without noise. N = 500, $\beta = 1/185$, $\phi_i = 0.2235, 0.3524, 0.4244$, $R_i = 1.0$, i = 1, 2, 3. The solid guide line in the inset is at 2.02. (o m = 2, □ m = 3, O m = 4).

Now we analyze the same perceptron-SGen and CM, but with noise added (see Figures 12, 13). We embedded the time series, as in the non-noisy case, in m = 2, 3 and 4 dimensional spaces. We used uniformly distributed noise with an amplitude of $\pm 10^{-2}$, while the attractor's amplitude is bounded by ±0.7 (±1.2) for the perceptron (CM), prior to rescaling. Our results are similar to those for other noisy dynamic systems [10], in the sense that for lengths greater than the characteristic noise scale, the measured AD saturates at the true dimensionality, i.e. in this case AD = 1 (2) (as in the non-noisy case). Below that scale, however, the noise dominates, and since in general it fills the space in all dimensions, the slope increases with the embedding dimension. In our case, the slope measured for the noise is correct only for m = 2, 3, while in higher dimensions it is lower than the embedding dimension. The reason for this inaccuracy is that we have not used enough points, so the space was not filled densely by the noise. The results are AD ≈ 1.01 (2.07), which for the CM is slightly higher than in the non-noisy case.
FIG. 12. Perceptron with noise added. N = 100, $\beta = 1/40$, $\phi = 0.2235$, R = 1.0. The solid guide line in the inset is at 1.01. (o m = 2, □ m = 3, O m = 4).

FIG. 13. CM that exhibits a 2D attractor, with noise added. N = 500, $\beta = 1/185$, $\phi_i = 0.2235, 0.3524, 0.4244$, $R_i = 1.0$, i = 1, 2, 3. The solid guide line in the inset is at 2.07. (o m = 2, □ m = 3, O m = 4).

In all the figures, one can easily identify the scaling 
region which is quite broad, more than an order of mag- 
nitude of length. The conclusion from the results is that 
the SGen maintains its AD in the presence of noise. The 
effect of noise is confined to small length scales, as expected.



V. ESCAPE FROM A META-STABLE 
ATTRACTOR 

So far, we have discussed several properties of the dy- 
namics in the neighborhood of a single attractor. This 
section is devoted to the analysis of the dynamics when 
there are multiple attractors. In particular, we focus on 
the average time to escape from the domain of one of the 
attractors. The picture one should have in mind is of sev- 
eral states having local stability with transitions between 
them induced by noise. 

In the first subsection we derive an analytical result for the mean first passage time (MFPT) in the limit of weak noise and a weakly non-linear map. The reasons for taking these limits will be explained later. The second subsection describes a series of simulations which support the analytical results.



A. The mean first passage time in periodic attractors 

In the following we analyze the case where each of the meta-stable states is characterized by an N-state periodic attractor. This property is achieved by setting the phase shift $\phi$ (e.g. equation 23) to zero. In order to keep the discussion as simple as possible, let us restrict ourselves to the case of a perceptron-SGen with two Fourier components in the power spectrum of the weight vector. Hence, the weight vector is given by:

$$J_j = \sum_{m=1}^{2} R_m \cos\Big(\frac{2\pi}{N} K_m j\Big) \tag{24}$$



1. A simplified model 

The key point is our ability to identify a low-dimensional discrete dynamics that describes the evolution of the solution, and to relate it to the SGen. It has been shown previously that the general solution for a perceptron-SGen with a weight vector defined by equation 24 is of the form:

$$S^t = \tanh\left[\sum_{m=1}^{2} A_m \cos\left(\frac{2\pi}{N} K_m t\right)\right] \qquad (25)$$



This solution leads to self-consistent coupled equations for the amplitudes of the dynamic solution:

$$A_m^{n+1} = \beta N R_m \sum_{p=1}^{\infty} C(p) \sum_{l=0}^{p-1} \frac{(A_m^n)^{2l+1}\,(A_{m'}^n)^{2(p-l-1)}}{l!\,(l+1)!\,[(p-l-1)!]^2} \qquad (26)$$

where $n$ labels the discrete time, $C(p) = (2^{2p} - 1) B_{2p} / p$ ($B_p$ are the Bernoulli numbers), and $m' = 2$ for $m = 1$ and vice-versa.
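The coefficients $C(p)$ come from the Taylor expansion of the hyperbolic tangent, and they can be checked with exact rational arithmetic. The sketch below computes the Bernoulli numbers from their standard recurrence (with the convention $B_1 = -1/2$) and evaluates both $C(p)$ and the Taylor coefficients of $\tanh$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli(n_max):
    """Bernoulli numbers B_0..B_n_max (convention B_1 = -1/2) as exact fractions."""
    b = [Fraction(0)] * (n_max + 1)
    b[0] = Fraction(1)
    for n in range(1, n_max + 1):
        # Standard recurrence: sum_{k=0}^{n} binom(n+1, k) B_k = 0.
        b[n] = -sum(comb(n + 1, k) * b[k] for k in range(n)) / (n + 1)
    return b

def c_coeff(p, b):
    """C(p) = (2^{2p} - 1) B_{2p} / p, as defined in the text."""
    return (2 ** (2 * p) - 1) * b[2 * p] / p

def tanh_taylor_coeff(p, b):
    """Coefficient of x^{2p-1} in the Taylor series of tanh(x)."""
    return Fraction(2 ** (2 * p)) * (2 ** (2 * p) - 1) * b[2 * p] / factorial(2 * p)

b = bernoulli(8)
print([c_coeff(p, b) for p in (1, 2, 3)])
print([tanh_taylor_coeff(p, b) for p in (1, 2, 3)])
```

Only the first two values of $C(p)$ are needed for the third-order truncation used in the rest of this section.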

In the absence of noise, the coupled equations evolve into one of the two fixed points (f.p.'s) in which only one of the Fourier components has a non-vanishing coefficient. The addition of noise to the SGen dynamics generates a perturbation in each of the coupled equations. The perturbation can "kick" the system out of the vicinity of one stable f.p. so that it escapes to the other f.p. We are interested in the mean time for such an event to occur.

We assume $R_1 = R_2 = 1$, i.e. the symmetric case. In order to continue, we truncate and transform the coupled equations (equation 26). For small amplitudes, one need keep terms only up to third order. The result becomes:

$$A_m^{n+1} = \frac{\beta N}{2}\left[A_m^n - \frac{1}{4}\left(A_m^n\right)^3 - \frac{1}{2} A_m^n \left(A_{m'}^n\right)^2\right] \qquad (27)$$



where, as before, $m' = 2$ for $m = 1$ and vice-versa. One can treat these equations as a recursive solution for the amplitudes of the dynamic solution. In this sense, equations 27 become discrete dynamic equations. For notational convenience we shall relabel the variables with $A_1 \to x$ and $A_2 \to y$. In addition, we introduce the reduced variable $\tau$ as follows:

$$\tau = \frac{\beta}{\beta_c} - 1 \qquad (28)$$



where $\beta_c = 2/N$. This redefinition allows us to rewrite equation 27 as an $N$-independent map:

$$x_{n+1} = (1 + \tau)\left[x_n - \frac{1}{4} x_n^3 - \frac{1}{2} x_n y_n^2\right] \qquad (29)$$



The second equation is obtained by interchanging $x$ and $y$, $x \leftrightarrow y$.

Analysis of these equations under the assumption that $\tau \ll 1$ gives four symmetric f.p.'s, namely $y^* = 0$, $x^* = \pm 2\sqrt{\tau}$ and vice-versa. These f.p.'s are stable and we consider only the positive ones. In addition, we have a trivial unstable f.p. and four saddle points at $x_{sp} = \pm\frac{2\sqrt{3}}{3}\sqrt{\tau}$, $y_{sp} = \pm\frac{2\sqrt{3}}{3}\sqrt{\tau}$ (all four sign combinations). A typical phase portrait of this map is depicted in Figure 14 (actually, only the positive quadrant is shown). The stable f.p.'s are at $(x^*, y^*) = (0.2, 0)$ and $(0, 0.2)$ (the other two symmetric f.p.'s are not shown). The saddle point shown is at $x_{sp} = y_{sp} = 0.2/\sqrt{3} \approx 0.115$, whereas the other three are not shown. Let us denote this point by $SP^+$, with $SP^-$ denoting the saddle point $(x_{sp}^+, y_{sp}^-)$.

FIG. 14. Phase portrait of the 2D map with $\tau = 0.01$. SP denotes a saddle point.
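The fixed-point structure described above can be checked by iterating the noise-free map directly. This is a minimal sketch assuming the third-order form of equation 29, $x_{n+1} = (1+\tau)(x_n - x_n^3/4 - x_n y_n^2/2)$ with the symmetric equation for $y$; the starting points, step count, and function names are hypothetical.

```python
import numpy as np

def step(x, y, tau):
    """One noise-free iteration of the truncated 2D amplitude map."""
    x_new = (1 + tau) * (x - x**3 / 4 - x * y**2 / 2)
    y_new = (1 + tau) * (y - y**3 / 4 - y * x**2 / 2)
    return x_new, y_new

def iterate(x, y, tau, n_steps=20000):
    """Apply the map n_steps times and return the final point."""
    for _ in range(n_steps):
        x, y = step(x, y, tau)
    return x, y

tau = 0.01
# Start on either side of the line x = y in the positive quadrant.
print(iterate(0.12, 0.10, tau))   # expected to approach (x*, 0)
print(iterate(0.10, 0.12, tau))   # expected to approach (0, y*)
# Saddle on the diagonal, from 1 = (1 + tau) * (1 - 3 x^2 / 4).
x_sp = np.sqrt(4 * tau / (3 * (1 + tau)))
print(x_sp, step(x_sp, x_sp, tau))
```

The exact nonzero fixed point of this map is $2\sqrt{\tau/(1+\tau)}$, which reduces to the quoted $2\sqrt{\tau}$ for $\tau \ll 1$.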

The boundary between the two domains of attraction is clearly the line $x = y$. The additive noise, as mentioned above, perturbs these equations and as a result the system may escape from the domain of attraction defined by $x > y$. The random time it takes the system to reach the state $x = y$ is the first passage time. Note that the additive noise applied to the SGen is not the same as the noise used in our model here: the former acts directly on the SGen, while the latter is the effect of such noise on the amplitudes of the solution. The connection between the following model and the SGen is given in the next subsection.
The model for the perturbed system is described by the following 2D noisy map:

$$x_{n+1} = (1 + \tau)\left[x_n - \frac{1}{4} x_n^3 - \frac{1}{2} x_n y_n^2\right] + \xi_n \qquad (30)$$

where $\xi_n$ is a Gaussian additive noise distributed according to:

(31)

The map for $y$ is obtained in the same manner as in equation 29. Note that due to the mutual independence of the $\xi_n$, the process defined in equation 30 is a Markov process.

The region of interest is of $O(\sqrt{\tau})$. Following an appropriate rescaling of equation 30, we get:

$$x_{n+1} = x_n + \tau\left[x_n - \frac{1}{4} x_n^3 - \frac{1}{2} x_n y_n^2\right] + \xi_n \qquad (32)$$

where $x_n$, $y_n$, $\xi_n$ are the rescaled variables. We further rewrite the map in the following way:

$$f(x_n, y_n) = x_n - \tau U'(x_n, y_n) \qquad (33)$$

The derivative is taken with respect to $x$ or $y$, depending on the variable for which the map is written.

Say the initial condition is $y = 0$, $x = x^*$, i.e. one of the f.p.'s. Since the line connecting this f.p. and the saddle point is a valley, we may assume that the most probable escape route is along this line (or its mirror through the $x$-axis, i.e. the line connecting the f.p. with the saddle point $(x_{sp}^+, y_{sp}^-)$). This argument can be understood by decomposing each noise term into components tangent and perpendicular to the path. The perpendicular term decays fast due to the restoring force, hence we can conjecture that the dynamics is mainly 1D. Therefore, with the assumption of weak noise and $\tau \ll 1$ we can reduce the map to one dimension, on that path (for details, see [23]). Hence, a 1D noisy map is obtained:

$$s_{n+1} = s_n - \tau U'(s_n) + \xi_n \qquad (34)$$

where $s$ defines the path. The noise term is now the tangential projection of the 2D noise on the path. The path $s$ can be found by writing an equation for $x(y)$ on the path; however, the relation is implicit and cannot be used directly.

This type of 1D equation has been investigated for the case of small non-linearity [13,15], namely, the class of map functions with the property that $f(x)$ deviates only weakly from the identity map:

$$f(x) = x - \tau \frac{dU(x)}{dx}, \qquad \tau \ll 1 \qquad (35)$$

The analogy with our 1D map (equation 34) is obvious. In the next subsection, we adapt the derivation of [24] to our map.



2. MFPT analysis

In the following, we sketch the calculation of the MFPT for the process defined in equation 34. The complete derivation and simulations will be given in [23].

Assume that the process described in equation 34 is defined in $(-\infty, \infty)$ and define the random variable $\tilde{t}(s)$, the first passage time from the interval $I = [SP^-, SP^+]$, by:

$$\tilde{t} = \min\{n : |s_n| > s_{sp}\} \qquad (36)$$

i.e. the first time the process hits one of the boundaries, where $SP^{\pm}$ are the saddle points defined above and $s_{sp}$ is the value of $s$ at the saddle point. The MFPT, $t(s)$, starting from a point in $I$ is given by:

$$t(s) = \langle \tilde{t}(s) \rangle = E[\tilde{t} \mid s_0 = s] \qquad (37)$$

It was shown that the MFPT can be written as (e.g. [24]):

$$t(s) - 1 = \int_I P(z|s)\, t(z)\, dz \qquad (38)$$



where $P(z|s)$ denotes the transition probability to go from $s_n = s$ to $s_{n+1} = z$ in a single step. Under the assumption of weak noise, $\epsilon \ll 1$, the function $t(s)$ is nearly constant inside the domain of attraction. Variations occur mainly near the boundary, because only close to the boundary is there a finite probability to jump over the boundary in a small number of steps. Therefore, it was suggested [15,23] that this function be written as a product of a constant value and a boundary layer function:

$$t(s) = T h(s), \qquad h(s^*) = 1 \qquad (39)$$

where $s^*$ is the f.p.
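The integral equation for $t(s)$ (equation 38) can also be solved numerically by discretizing the interval, which makes the "nearly constant plus boundary layer" structure visible. The sketch below is illustrative only: it assumes a Gaussian transition kernel with mean $f(s)$ and variance $\epsilon$, and it uses a hypothetical single-well potential $U(s) = s^2/2$ rather than the implicit potential along the escape path. The grid size and function names are also assumptions.

```python
import numpy as np

def mfpt_profile(f, eps, s_max, n_grid=801):
    """Solve t(s) = 1 + int_I P(z|s) t(z) dz on I = [-s_max, s_max] by quadrature.

    P(z|s) is taken to be Gaussian in z with mean f(s) and variance eps (assumption).
    """
    s = np.linspace(-s_max, s_max, n_grid)
    ds = s[1] - s[0]
    w = np.full(n_grid, ds)
    w[0] = w[-1] = ds / 2                      # trapezoidal weights
    mean = f(s)[:, None]                       # row index: starting point s
    kernel = np.exp(-(s[None, :] - mean) ** 2 / (2 * eps)) / np.sqrt(2 * np.pi * eps)
    a = np.eye(n_grid) - kernel * w[None, :]   # linear system (I - K) t = 1
    return s, np.linalg.solve(a, np.ones(n_grid))

# Hypothetical 1D example: U(s) = s^2 / 2, so f(s) = s - tau * s.
tau, eps = 0.05, 0.02
s, t = mfpt_profile(lambda s: s - tau * s, eps, s_max=1.0)
print(t[len(t) // 2], t[0])   # MFPT from the well bottom and from the boundary
```

In this toy case the profile is flat near the well bottom and drops near the edges of the interval, which is the behaviour the product ansatz of equation 39 is designed to capture.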



The boundary layer extends a distance of order $\epsilon^{1/2}$ around $s = s_{sp}^{\pm}$, and we can write the scaled boundary layer function $\tilde{h}(s)$, $h(s) = \tilde{h}((2\epsilon)^{1/2} s)$. Inserting this assumption into equation 38 gives an integral equation for $\tilde{h}(s)$. This equation was solved analytically by Talkner et al. and by Knessl et al. The leading exponential part of the solution of this equation gives:

$$T \propto \exp\left[\frac{2\tau}{\epsilon}\left(U(SP^+) - U(s^*)\right)\right] \qquad (40)$$



The potential difference has been calculated analytically and found to be $U(SP^+) - U(s^*) = \frac{1}{3}$. The prefactor is obtained from integrals involving the boundary layer function. The final result for the MFPT reads:

$$T \propto \exp\left(\frac{2\tau}{3\epsilon}\right) \qquad (41)$$

where the proportionality factor is the prefactor mentioned above.
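The barrier height can be checked numerically under one explicit assumption about the potential. The sketch below takes $U(x, y) = -(x^2 + y^2)/2 + (x^4 + y^4)/16 + x^2 y^2/4$, which is the function whose gradient reproduces the cubic terms of equation 29 once the amplitudes are rescaled by $\sqrt{\tau}$; this form and the function names are assumptions, not taken from the text.

```python
import numpy as np

def potential(x, y):
    """Assumed rescaled potential whose gradient gives the cubic drift of the 2D map."""
    return -(x**2 + y**2) / 2 + (x**4 + y**4) / 16 + x**2 * y**2 / 4

def drift_x(x, y):
    """-dU/dx for the potential above: x - x^3/4 - x y^2/2."""
    return x - x**3 / 4 - x * y**2 / 2

fp = (2.0, 0.0)                          # stable fixed point in rescaled units
sp = (2 / np.sqrt(3), 2 / np.sqrt(3))    # saddle point on the diagonal
barrier = potential(*sp) - potential(*fp)
print(barrier)                           # potential difference U(SP+) - U(s*)
print(drift_x(*fp), drift_x(*sp))        # drift vanishes at both stationary points
```

With this potential the difference evaluates to one third, so a factor $2\tau/\epsilon$ in front of it gives an exponent of $2\tau/(3\epsilon)$.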
Simulations of our 2D model (equation 30) are shown in Figure 15. The reduced variable $\tau$ is varied for different noise amplitudes $\epsilon$. The results are in excellent agreement with the prediction of the 1D theory (equation 41): the fitted slope of 0.658 is close to the predicted value of 2/3.

FIG. 15. Scaling of the average logarithm of the escape time in the 2D model. The solid line is a linear regression and its slope is $0.658 \pm 0.003$.
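A Monte Carlo estimate of this kind can be sketched as follows. The code is not the authors' simulation: it assumes the rescaled map $x_{n+1} = x_n + \tau(x_n - x_n^3/4 - x_n y_n^2/2) + \xi_n$ (and symmetrically for $y$), independent Gaussian noise of variance $\epsilon$ in each component, a start at the rescaled fixed point $(2, 0)$, and escape defined as the first step with $x \le |y|$. The run counts and function names are hypothetical.

```python
import numpy as np

def escape_time(tau, eps, rng, max_steps=10**6):
    """Steps until the noisy rescaled 2D map, started at (2, 0), first has x <= |y|."""
    x, y = 2.0, 0.0
    sigma = np.sqrt(eps)
    for n in range(1, max_steps + 1):
        xi_x, xi_y = sigma * rng.standard_normal(2)
        x, y = (x + tau * (x - x**3 / 4 - x * y**2 / 2) + xi_x,
                y + tau * (y - y**3 / 4 - y * x**2 / 2) + xi_y)
        if x <= abs(y):
            return n
    return max_steps  # run truncated without escape

def mean_log_escape(tau, eps, n_runs=50, seed=0):
    """Average of log(escape time) over independent runs."""
    rng = np.random.default_rng(seed)
    return np.mean([np.log(escape_time(tau, eps, rng)) for _ in range(n_runs)])

tau = 0.05
for ratio in (1.0, 2.0, 3.0):          # ratio = tau / eps
    print(ratio, mean_log_escape(tau, tau / ratio))
```

Plotting the printed averages against $\tau/\epsilon$ and fitting a line is the analogue of the regression in Figure 15; the weak-noise regime requires larger ratios, at the cost of exponentially longer runs.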

In the next subsection we present the results of extensive simulations of the real system, i.e. the SGen.



B. Numerical simulations 

Measuring the MFPT directly from the time series generated by the noisy SGen is impossible, since there is no way to distinguish between the different attractors. The natural variable that does measure the projection of the current state on each attractor is the relative amplitude in the power spectrum of the input vector. Note that there exists an equivalence between the amplitudes of the solution to the coupled equations (equation 26) and the amplitudes of the corresponding Fourier components in the power spectrum.
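Reading the amplitude of each component off the input vector amounts to evaluating two bins of its discrete Fourier transform. A minimal sketch, assuming integer wave numbers $0 < K < N/2$ and a length-$N$ input vector; the wave numbers, amplitudes, and function name below are hypothetical.

```python
import numpy as np

def component_amplitudes(inputs, wave_numbers):
    """Amplitude of selected Fourier components of a length-N input vector."""
    n = len(inputs)
    spectrum = np.fft.rfft(inputs)
    # For a real cosine of amplitude A at integer wave number K, |FFT[K]| = A * N / 2.
    return np.array([2 * np.abs(spectrum[k]) / n for k in wave_numbers])

n = 200
j = np.arange(n)
k1, k2 = 3, 7   # hypothetical wave numbers of the two components
vec = 0.2 * np.cos(2 * np.pi * k1 * j / n) + 0.05 * np.cos(2 * np.pi * k2 * j / n)
a1, a2 = component_amplitudes(vec, (k1, k2))
print(a1, a2)
```

A first passage event in the sense used here would then be the first iteration at which the two returned amplitudes cross.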

We study an SGen with a weight vector containing two Fourier components, as described at the beginning of the previous subsection, with no phase shift and both amplitudes equal to one, $R_1 = R_2 = 1$. We applied the sequential updating scheme described in section II with noise that is normally distributed, $\mathcal{N}(\mu = 0, \sigma = 0.1)$. We set the initial condition for each run to one of the Fourier components. In each experiment we measure the number of iterations before the amplitudes of the two components in the power spectrum of the input vector become equal. As we expect an exponential behaviour of this quantity, we record the logarithm of the first passage time. We found that the average logarithm of the median first passage time has smaller variations than the average logarithm of the first passage time over the whole data set. Each pair $(N, \tau)$ was tested 200-400 times and the first passage time was recorded. The list of times was divided into 10 groups and the logarithm of the median of each group was taken. Finally, we end up with 10 values from which we calculated the first and second moments.
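The grouping procedure can be written compactly. The sketch below assumes the recorded times are split into 10 groups in the order they were recorded and that the second moment is reported as a sample standard deviation; both are assumptions, and the input times are synthetic numbers used only to exercise the function.

```python
import numpy as np

def grouped_log_median(times, n_groups=10):
    """Mean and standard deviation of log(median) over n_groups equal-sized groups."""
    times = np.asarray(times, dtype=float)
    groups = np.array_split(times, n_groups)
    log_medians = np.array([np.log(np.median(g)) for g in groups])
    return log_medians.mean(), log_medians.std(ddof=1)

# Synthetic first passage times, for illustration only (exponentially distributed).
rng = np.random.default_rng(1)
times = rng.exponential(scale=5000.0, size=300)
mean_log, std_log = grouped_log_median(times)
print(mean_log, std_log)
```

Using the median inside each group limits the influence of the long tail of an exponential-like distribution, which is consistent with the smaller variations reported above.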

Figure 16 depicts the ensemble of all experiments, in which we varied the size of the system in the range $200 < N < 1500$ and the reduced variable $\tau$ in the range $0.003 < \tau < 0.04$. To demonstrate the scaling properties, we plot the average logarithm of the median escape time as a function of $\tau N^{\alpha}$. The largest error of the data points is about the size of the symbol, hence error bars were omitted for clarity. Clearly, the average median time to escape follows the relation:

$$\langle t \rangle = a \frac{N}{\tau} \exp\left(b \tau N^{\alpha}\right) \qquad (42)$$

where $a$, $b$, $\alpha$ are constants (given below).
FIG. 16. Scaling of the average logarithm of the median escape time. The solid line is a linear regression and its slope is $11.2 \pm 0.12$.

In order to appreciate this result, we need an appropriate variable transformation. Recall that in previous sections we saw that the projection of the noise scales as $1/\sqrt{N}$, hence its second moment scales as $1/N$. On the other hand, correlations between the noise terms might affect this scaling, therefore the exponent should be $b \tau N^{\alpha} = \tau/\epsilon$, where $\alpha < 1$. Our simulations show that $\alpha \approx 0.5$. Also note that $b$ increases linearly with $\sigma^{2\alpha}$. The prefactor is affected by the nature of the sequential scheme, i.e. the fact that time is rescaled. As expected, it was found that the polynomial increase in the MFPT is linear in the size of the system, with $a \approx 0.07$. The constant slope in the exponent ($b$) found from simulations is 11.2, while the prediction given by the model is $\approx 9.4$.
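A fit of this kind can be sketched as a linear regression in the scaled variable. The code assumes the functional form $t = a (N/\tau) \exp(b \tau N^{\alpha})$ with $\alpha$ fixed in advance, which is an assumption about how the constants were extracted; the data are generated synthetically from that same form (using the quoted values of $a$ and $b$) purely to check that the regression recovers them, and are not measurements.

```python
import numpy as np

def fit_escape_law(n_sizes, taus, log_times, alpha):
    """Least-squares fit of log t - log(N / tau) = log a + b * tau * N**alpha."""
    n_sizes, taus, log_times = map(np.asarray, (n_sizes, taus, log_times))
    x = taus * n_sizes ** alpha
    y = log_times - np.log(n_sizes / taus)
    b, log_a = np.polyfit(x, y, 1)
    return np.exp(log_a), b

# Synthetic check: data generated from the assumed law, not measured values.
rng = np.random.default_rng(2)
n_sizes = rng.integers(200, 1501, size=40).astype(float)
taus = rng.uniform(0.003, 0.04, size=40)
log_times = np.log(0.07 * n_sizes / taus) + 11.2 * taus * np.sqrt(n_sizes)
print(fit_escape_law(n_sizes, taus, log_times, alpha=0.5))
```

In practice $\alpha$ would be scanned and the value giving the best collapse onto a straight line retained.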
VI. DISCUSSION

In this work we have studied the time series generated by a noisy Sequence Generator (SGen). We have focused on the robustness, in the presence of noise, of the analytical results obtained for the isolated system, on the issue of phase coherence, and on the escape time from a meta-stable attractor.

Although the system does not become ergodic in the presence of noise, the attractor is broadened. We have analyzed this phenomenon for the case of a perceptron-SGen and found that the broadening of the attractor in phase space scales as $1/\sqrt{N}$. Nevertheless, it is clear that this result is applicable to more complicated architectures as well.

Analysis of the phase coherence is highly important in quasi-periodic complex time series since, in general, merely identifying the governing frequencies in the system is insufficient. To investigate this phenomenon, we have analyzed the behaviour of the diffusion coefficient, which is related to the growth with time of the variance of the phase error. For uncorrelated noise, we show that the diffusion coefficient should scale inversely with $N$. In order to test this argument numerically, we used two updating schemes. The parallel scheme corresponds exactly to our model. For the sequential scheme the diffusion coefficient scales as $1/N^2$, since the time is rescaled by $1/N$. Nevertheless, the conclusion is the same, namely, coherence is indeed maintained for a time which scales less than linearly with the size of the system, i.e. $t \sim N^{\alpha}$ ($\alpha < 1$) for large $N$. The loss of phase coherence is also manifested in a broadening of the dominant component in the power spectrum: the larger $N$ is, the sharper the dominant component.

We have calculated numerically the attractor dimension from time series generated by SGens in both cases (noisy and isolated), for the perceptron as well as for a multi-layer network. The results for the noisy and isolated systems are very similar and in agreement with the analytical results obtained for the isolated system, i.e. the attractor dimension does not change in the presence of noise. This result is, of course, not surprising from the point of view of dynamical systems, as described in section IV.



When the noise interacts with a system that consists of more than a single attractor, one distinguishes between two time scales. In the short term, the system is still stable with respect to the previous results, namely one can work within the framework of a single attractor. However, for long times, fluctuations take over and the system may escape from the initial basin of attraction. We have developed the theory for the mean first passage time to escape an attractor defined by a Fourier component in the power spectrum of the weight vector. For this analytical investigation, proper variables were identified. These are the amplitudes of the solution to the unperturbed system. Without noise, we found that these variables are connected via coupled equations; however, in the generic case only one variable has a stable non-zero value (above bifurcation). Adding noise to the dynamics perturbs this solution. We have focused on the case of two symmetric attractors. In the limit of small noise and not far from the bifurcation we were able to reduce the dimensionality of the dynamics to a 1D flow. This manipulation allows us to use the theory developed for discrete dynamics driven by noise. The results resemble those obtained in systems with a potential barrier undergoing tunneling, in the sense that the escape time has a polynomial prefactor and a leading exponential term. We defined a reduced variable $\tau$ (equation 28) which is closely related to the amplitude of the solution. This quantity plays the role of the potential gap. Simulations of the SGen with two symmetric attractors have shown that our theory, and especially the reduction to a 1D flow, are correct. The small corrections to the theory are due to the correlations between the noise terms in the sequential scheme, while in the theory we assumed uncorrelated noise. In order to complete the picture we still have to solve the non-symmetric case, and to extend it to more than two attractors (details will be given in [23]). However, we expect that as long as the number of significant attractors does not scale with the size of the system, this theory can provide a good explanation. Further extensions can also be made to the multi-layer network.

Although this analysis was applied to a perceptron- 
SGen, it is reasonable to expect that the general proper- 
ties remain valid in the case of a generic two-layer net- 
work where each perceptron-SGen exhibits its attractors. 